
I2P Network Client Development Basics

This article is intended for those who would like to develop their own I2P client from scratch. Familiarity with the basic concepts of I2P is assumed; there are by now plenty of introductory documents and articles on the subject, including ones translated into Russian. There is also official documentation that describes the protocols and message formats quite well. Unfortunately, it is scattered, and many non-obvious details are missing. This article is based primarily on studying and debugging the official I2P Java client. The ultimate goal is a complete implementation in C++. The source code of the project in its current state is available on github.

Encryption used


To build your own I2P router, you need the following cryptographic algorithms:
  1. ElGamal. Asymmetric encryption based on modular exponentiation. The generator and modulus are fixed constants for the entire I2P network. In addition to the standard block size of 514 bytes, a non-standard block size of 512 bytes is also used.
  2. Diffie-Hellman, to derive a shared symmetric encryption key by exchanging public keys. The same keys are used as for ElGamal.
  3. DSA for creating and verifying digital signatures
  4. AES in two modes: CBC, using an encryption key and an initialization vector (IV), and ECB, for encrypting the 16-byte IV itself
  5. SHA256 for calculating hashes
  6. Adler32 to calculate message checksum
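
As a rough illustration of items 2, 5 and 6, the sketch below shows Diffie-Hellman key agreement by modular exponentiation, together with the SHA256 and Adler32 calls from the Python standard library. The prime P and generator G are toy values chosen for readability (an assumption, not the fixed I2P constants), and the function names are hypothetical.

```python
import hashlib
import secrets
import zlib

# Toy parameters for illustration only. The real network uses its own
# fixed, much larger prime and generator, which are not reproduced here.
P = 2**127 - 1  # a Mersenne prime (assumed toy modulus)
G = 5           # assumed toy generator

def dh_keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)

def dh_shared(private, peer_public):
    """Both sides arrive at the same value: G^(a*b) mod P."""
    return pow(peer_public, private, P)

def sha256_hash(data: bytes) -> bytes:
    """32-byte SHA256 digest."""
    return hashlib.sha256(data).digest()

def adler32_checksum(data: bytes) -> int:
    """32-bit Adler32 checksum as an unsigned integer."""
    return zlib.adler32(data)
```

Each side keeps its private exponent secret and sends only the public value; calling dh_shared on both sides yields the same number, from which a symmetric key can then be taken.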


Basic protocols


The I2P network consists of 4 main layers:
  • Transport layer. These are encrypted Internet connections TCP/IP or UDP. Includes connection establishment and encryption.
  • Tunnels. “Windows” of a node to the outside world, located on other nodes and allowing it to hide its true location. A tunnel consists of a sequence of nodes connected by transport layer protocols. Put simply, a tunnel can be thought of as a chain of proxy servers that anonymizes both the client and the server.
  • “Garlic”. Transmission of messages or sequences of messages between two end nodes via arbitrary routes and tunnels. Characterized by session identifiers and by asymmetric encryption, followed by symmetric encryption once a session is established
  • Application layer protocols for transferring user data between nodes.

Each layer adds its own encryption for a different purpose. Transport layer encryption hides traffic from the provider; tunnel encryption hides the content and direction from intermediate tunnel nodes; “garlic” encryption hides the content from the final tunnel nodes when messages are passed between tunnels.

Transport layer


To establish a transport layer connection, you need to know the IP address and port of the peer. There is a list of known nodes, called netDb, that changes during operation; information about new nodes comes from other nodes. Initially, the list of nodes is downloaded from special sites whose addresses are listed explicitly in the file router/networkdb/Reseeder.java. The protocol running on top of TCP/IP is called NTCP, and the one on top of UDP is called SSU. Apart from some differences in connection setup, SSU, being packet-based, supports splitting long messages into several fragments. Transmitted messages consist of a header, an I2NP message (more on the I2NP protocol below) and a checksum. A special message containing the current time is transmitted periodically for synchronization. When a connection is established, the routers exchange public keys, from which each side independently computes a shared AES key using the Diffie-Hellman algorithm.
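
The “header, I2NP message, checksum” structure can be sketched roughly as follows. This is a simplified assumption, not the exact NTCP or SSU wire format: the header here is just a 2-byte big-endian length, there is no padding and no encryption, and the function names are hypothetical.

```python
import struct
import zlib

def frame_message(i2np: bytes) -> bytes:
    """Wrap an I2NP message as: 2-byte length, payload, 4-byte Adler32."""
    body = struct.pack(">H", len(i2np)) + i2np
    return body + struct.pack(">I", zlib.adler32(body))

def unframe_message(frame: bytes) -> bytes:
    """Verify the checksum and length, then return the I2NP payload."""
    body = frame[:-4]
    (checksum,) = struct.unpack(">I", frame[-4:])
    if zlib.adler32(body) != checksum:
        raise ValueError("checksum mismatch")
    (length,) = struct.unpack(">H", body[:2])
    if length != len(body) - 2:
        raise ValueError("length mismatch")
    return body[2:]
```

In a real transport the whole frame would additionally be encrypted with the AES key negotiated during connection setup.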

Tunnels


Tunnels are always unidirectional: messages can only travel from the input node (Gateway) to the output node (Endpoint). The owner of a tunnel is the node that holds all the information about it. Depending on which end belongs to the owner, tunnels are divided into incoming (the owner is the output node) and outgoing (the owner is the input node). Intermediate nodes do not know whether the tunnel is incoming or outgoing; the only thing an intermediate node does is encrypt the message with its own encryption key and pass it to the next node. An important consequence follows from this: the sequential decryption of tunnel messages must be carried out by the owner, since only the owner has the encryption keys of all intermediate nodes. For incoming tunnels this is fairly obvious: having received a message, the output node decrypts it layer by layer. For outgoing tunnels, however, the original unencrypted message must be sequentially decrypted before it is sent, so that the encryption applied by each node along the way cancels one layer and the original message emerges at the output node.
Tunnels that a node does not own are called transit tunnels. Transit tunnels carry other nodes' traffic and are necessary for the functioning of the entire I2P network; carrying them is what turns a node into a router.
Tunnel nodes use AES encryption with three different keys: one is used to encrypt the node's response when the tunnel is created, and the other two are used to transmit data through the tunnel: one key encrypts the data itself, and the other encrypts the initialization vector (IV) used to encrypt the data. In this case, the IV is encrypted with the same key twice: before encryption and after; this is called double encryption. The node receives these two keys in its record of the tunnel creation message, encrypted with its public key using ElGamal.
Inside tunnels, only TunnelData messages are transmitted, generally consisting of several fragments. The TunnelGateway message is used for transmission between tunnels. Although the official documentation says that a two-way connection needs at least 4 tunnels (2 incoming and 2 outgoing), in practice you do not have to send messages through outgoing tunnels: you can send a TunnelGateway message directly to the input node of the desired incoming tunnel.
In the TunnelData message, the checksum is calculated over the content data following the zero byte, with the unencrypted IV appended to it.
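
The per-node processing and the owner's layered decryption can be sketched as follows. To stay within the standard library, the sketch uses a toy block cipher (byte-wise addition of a key-derived pad) in place of AES; that substitution, the helper names and the key values are all assumptions made for illustration. Only the structure matters: ECB on the IV, CBC on the data, ECB on the IV again.

```python
import hashlib

BLOCK = 16  # AES block size in bytes

def _pad(key: bytes) -> bytes:
    # Key-dependent pad for the TOY cipher (a stand-in, NOT real AES).
    return hashlib.sha256(key).digest()[:BLOCK]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    return bytes((b + p) % 256 for b, p in zip(block, _pad(key)))

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    return bytes((b - p) % 256 for b, p in zip(block, _pad(key)))

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(data), BLOCK):
        prev = toy_encrypt_block(key, _xor(data[i:i + BLOCK], prev))
        out += prev
    return out

def cbc_decrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        out += _xor(toy_decrypt_block(key, block), prev)
        prev = block
    return out

def hop_encrypt(iv_key, layer_key, iv, data):
    """What one tunnel node does: IV -> ECB, data -> CBC, IV -> ECB again."""
    iv1 = toy_encrypt_block(iv_key, iv)
    return toy_encrypt_block(iv_key, iv1), cbc_encrypt(layer_key, iv1, data)

def owner_decrypt(iv_key, layer_key, iv, data):
    """Exact inverse of hop_encrypt for one node, computed by the owner."""
    iv1 = toy_decrypt_block(iv_key, iv)
    return toy_decrypt_block(iv_key, iv1), cbc_decrypt(layer_key, iv1, data)

def owner_decrypt_all(hops, iv, data):
    """Undo every node's layer; hops are (iv_key, layer_key) in path order."""
    for iv_key, layer_key in reversed(hops):
        iv, data = owner_decrypt(iv_key, layer_key, iv, data)
    return iv, data
```

For an outgoing tunnel the owner calls owner_decrypt_all before sending, and the nodes' hop_encrypt calls then restore the original message; for an incoming tunnel the same function is applied after the message has passed through all nodes.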

I2NP protocol


Data exchange within the I2P network occurs using I2NP messages of various types. Each message contains a header with its type and length, which makes it possible to find the boundaries between messages. Depending on the type, the message length can vary from 20 to 64K bytes. Each layer uses “wrapper” messages containing other I2NP messages from a higher layer. For tunnels, such “wrappers” are TunnelData messages for transmission within tunnels and TunnelGateway messages for transmission between tunnels. For “garlic”, it is the Garlic message. Most I2P traffic consists of the following nested messages:
Data->Garlic->TunnelData.
As a rule, messages are transmitted through tunnels, although they can also be transmitted directly between routers, in particular for the initial creation of new tunnels. Routers also exchange DatabaseStore messages immediately after establishing a connection. Messages between destinations should be sent via garlic, since the corresponding field is only present there.
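
A rough sketch of how the type and length in the header let a receiver split a byte stream into messages. It assumes a 16-byte header layout (1-byte type, 4-byte message ID, 8-byte expiration, 2-byte payload size, 1-byte checksum taken from the SHA256 of the payload); this layout should be checked against the official I2NP specification. The function names and the type values used with it are hypothetical.

```python
import hashlib
import struct

# Assumed header layout: type, message id, expiration (ms), size, checksum.
HEADER = struct.Struct(">BIQHB")  # 1 + 4 + 8 + 2 + 1 = 16 bytes

def build_message(msg_type: int, msg_id: int, expiration_ms: int,
                  payload: bytes) -> bytes:
    """Prepend the header to a payload."""
    checksum = hashlib.sha256(payload).digest()[0]
    header = HEADER.pack(msg_type, msg_id, expiration_ms,
                         len(payload), checksum)
    return header + payload

def split_messages(stream: bytes):
    """Split concatenated messages using the size field of each header."""
    messages = []
    offset = 0
    while offset < len(stream):
        msg_type, msg_id, _exp, size, checksum = HEADER.unpack_from(stream, offset)
        start = offset + HEADER.size
        payload = stream[start:start + size]
        if len(payload) != size or hashlib.sha256(payload).digest()[0] != checksum:
            raise ValueError("truncated or corrupted message")
        messages.append((msg_type, msg_id, payload))
        offset = start + size
    return messages
```

Nesting works the same way: the payload of a “wrapper” message is itself passed to the parser of the next layer.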

Routers and destinations


To work on an I2P network, you need an I2P client, which consists of a router that provides access to the I2P network and destinations for exchanging meaningful information. Information about routers, including their IP addresses, is publicly available; moreover, the current list of routers can be downloaded from special ftp sites. At the same time, information about the location of destination points is confidential. Information about destination points located on a given router is available only to this router; for all others, obtaining this information is not possible, which is one of the main mechanisms for ensuring anonymity of the I2P network.
Since routers are mainly located on the computers of network participants, their composition changes all the time. Therefore, routers have to keep their list of other routers up to date continuously. This process is called exploration (“probing”) and consists of sending requests with a randomly selected 32-byte address to special routers called floodfills. Floodfill routers are assumed to have complete information about the network. Among other things, floodfill routers constantly pass information about newly found nodes to each other.
To request information about a node, the I2NP DatabaseLookup message is used, and the DatabaseStore message is used to transmit the information itself. Typically, messages are transmitted through tunnels, but a node sends its DatabaseStore directly at the transport level immediately after a connection is established, thereby informing the network of its existence. Otherwise, building tunnels for new nodes would be impossible.
DatabaseStore can carry two types of information: a RouterInfo structure, in which case the address belongs to a router, or a LeaseSet, in which case it belongs to a destination.
RouterInfo contains the public keys of the router, as well as a variety of service information, the most important of which are IP addresses, ports and supported transport protocols for the connection and information about whether the given router is a floodfill or not. Since RouterInfo can contain quite a lot of text information, it is transmitted gzipped.
LeaseSet contains a list of incoming tunnels for a given destination, as well as a public key for encrypting garlic messages addressed to this destination.
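
A minimal sketch of how a local netDb might file incoming DatabaseStore contents by type. The class, the string constants and the sample RouterInfo text are hypothetical; the only facts taken from the text above are the 32-byte address, the two content types, and the gzip compression of RouterInfo.

```python
import gzip

ROUTER_INFO = "RouterInfo"  # hypothetical type tags for this sketch
LEASE_SET = "LeaseSet"

class NetDb:
    """Toy in-memory store keyed by 32-byte address."""

    def __init__(self):
        self.routers = {}       # address -> decompressed RouterInfo bytes
        self.destinations = {}  # address -> LeaseSet bytes

    def store(self, key: bytes, kind: str, payload: bytes) -> None:
        if len(key) != 32:
            raise ValueError("address must be 32 bytes")
        if kind == ROUTER_INFO:
            # RouterInfo arrives gzipped because it is mostly text.
            self.routers[key] = gzip.decompress(payload)
        elif kind == LEASE_SET:
            self.destinations[key] = payload
        else:
            raise ValueError("unknown DatabaseStore content type")

    def is_router(self, key: bytes) -> bool:
        return key in self.routers
```

A lookup by address then tells you immediately whether you are dealing with a router, which can be contacted at the transport layer, or with a destination, which can only be reached through one of its incoming tunnels.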

Application Layer Services


Let's consider the meaningful actions of the I2P client: anonymous hosting of online resources, and, accordingly, access to them. First, let's try to get data from some website, for example, Flibusta. At the moment we only have a 32-byte hash of its I2P address, our goal is to send an HTTP request and receive a response.
Of course, there is no router with such an address in the database (otherwise the IP address of the resource would be visible to everyone), so the only way to send a request is some incoming tunnel of the desired node that exists at a given time, for which you must first request and receive a LeaseSet. Unlike RouterInfo, which can be requested and received from a neighbor at the transport layer, LeaseSet can only be requested and received through tunnels that must first be built. This leads to a disappointing conclusion that it will not be possible to use an I2P network “on demand”; the I2P router must be running and must constantly be engaged in building and maintaining tunnels. Due to the decentralized nature of the network, building tunnels is a very difficult task - most attempts to create tunnels end in failure.
To successfully build a tunnel, two conditions are required:
  1. All nodes participating in the tunnel must be reachable at the transport layer by at least the previous node in the tunnel
  2. All nodes participating in the tunnel must agree to build a new tunnel. A node may refuse to create a tunnel, for example, due to its congestion

The maximum lifetime of a tunnel is 10 minutes; a tunnel can cease to exist earlier if a node participating in it goes offline. Therefore, tunnel owners constantly send test messages to keep the list of “live” tunnels up to date.
So, the tunnels are built and the required LeaseSet has been obtained. Now we can send an HTTP request, and it will even reach the recipient, but we would also like to receive a response. To do this, we must include our own LeaseSet in our message; the response will then be sent to us through one of our incoming tunnels and will most likely reach our node safely. Since several connections can operate through our node simultaneously, either each of them must be assigned its own I2P address with a LeaseSet formed from several incoming tunnels, or a “shared” address must be created that multiplexes connections using a special protocol with the corresponding fields, acting as a “wrapper” over the application layer protocol. This protocol is called I2CP, and the official I2P client uses it exclusively, although it is not required for building your own services. Of course, to access Flibusta you have to use I2CP, since that is what it expects. However, to build, for example, your own torrent-like network, you can get by with I2P addressing alone.
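
Keeping the list of live tunnels current can be sketched as below. The Tunnel fields and function name are hypothetical; the 10-minute lifetime comes from the text, while treating a single failed test as fatal is a simplifying assumption.

```python
from dataclasses import dataclass

TUNNEL_LIFETIME = 600.0  # seconds (10 minutes)

@dataclass
class Tunnel:
    tunnel_id: int
    created_at: float          # timestamp in seconds
    last_test_ok: bool = True  # result of the most recent test message

def live_tunnels(tunnels, now: float):
    """Keep tunnels that are younger than 10 minutes and passed their test."""
    return [t for t in tunnels
            if now - t.created_at < TUNNEL_LIFETIME and t.last_test_ok]
```

A router would run such a filter periodically and start building replacements well before the remaining tunnels expire.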

The I2CP protocol and the protocol stack built on top of it are a separate topic, covered in a separate article.

Comments 35

“There is a list of known nodes, called netDb, which changes during operation; information about new nodes comes from other nodes.”
Does this mean that these known nodes must have public (“white”) IP addresses?
No. These can be IP addresses of routers, certain ports on which are forwarded to the I2P router. This I2P router itself may be behind NAT.
In general, yes. Otherwise, an arbitrary router will not be able to forward any message to it if it cannot establish a connection with it, which sharply limits its possibilities. First of all, such a router will actually not be able to participate in the transmission of transit traffic; in addition, incoming tunnels can only be built through routers with which there is already a connection.
On the other hand, you can host, for example, your own website without having a “white” IP: all it takes is properly built tunnels for the LeaseSet.
I would also like to know the answer to this question. When I came across this, I couldn’t understand where I should get the IV to encrypt the IV, until I realized that ECB was being used (by the way, this point was omitted in the official documentation). Moreover, the IV needs to be encrypted twice.
I got the impression that I2P was started by students trying to put into practice the knowledge gained in lectures, perhaps not always advisable.
In any case, when developing a client, we do not have the opportunity to change the protocol, but are forced to implement what we have in order to interact with other network nodes.
Could you explain in a little more detail, if it’s not too much trouble: “In this case, IV is encrypted with the same key twice: before encryption and after, this is called double encryption”. What is meant by “before encryption” and “after encryption”? The IV is not part of the plaintext, so how does the IV before encryption differ from the one after?
You have an IV that comes in the TunnelData message, which is part of the open text in the tunnel sense.
www.i2p2.de/tunnel_message_spec.html

Depending on which side you are, you must encrypt or decrypt it with your IV key, then use the resulting value as the IV to encrypt the content itself, then do the same with the IV once more; the message with the re-encrypted IV then goes on.
It turns out that we initially have:
— KeyIV - key for IV
— KeyMessage - message key
— PlainIV - open IV
— PlainMessage - clear text

EncryptedTempIV = AES(plaintext = PlainIV, key = KeyIV, mode = ECB)
EncryptedMessage = AES(plaintext = PlainMessage, key = KeyMessage, iv = EncryptedTempIV, mode = CBC)
EncryptedIV = ???
FinalMessage = Pack(EncryptedIV, EncryptedMessage)

I see several options on how to get EncryptedIV:
a) AES(plaintext = EncryptedTempIV, key = KeyIV, mode = ECB)
b) AES(plaintext = EncryptedTempIV, key = KeyIV, iv = PlainIV, mode = CBC)
c) AES(plaintext = EncryptedTempIV, key = KeyMessage, mode = ECB)
d) fourth funny option

Regarding the first option (a): are the authors absolutely sure that Encrypt(Encrypt(x)), where Encrypt(x) := AES(plaintext = x, key = SAME_KEY, mode = ECB), is even a good idea? The example of XORing with a keystream comes to mind immediately, where applying the same key twice completely undoes the encryption. Has anyone actually analyzed what happens to the data when it is encrypted twice with the same key?

Which option is actually used?
Option a) is used. Same key twice.
As far as I understand, this is how they tried to remove the IV that was transmitted openly.
I agree that the idea is silly, like putting on two condoms at the same time.
All I am sure of is that this is the option actually used; otherwise other nodes would not be able to understand my messages, just as I would not be able to understand theirs.
If you run a website, then the task of an intelligence agency is to determine the I2P address of the router your site is attached to. To do this, it needs to trace the route of one of your incoming tunnels, which only you, as the creator of the tunnel, know. Participants only know the address of the next node in the tunnel. Since tunnels usually have 3 nodes, it is enough to visit these 3 participants. The only problem is that these nodes will most likely be in different countries. But they could easily all belong to a single owner (an intelligence agency).
The IPs of all routers are public, moreover, the current list can be downloaded from a number of ftp servers.
The server router, in principle, can find out the IP addresses of all nodes of the tunnel it has created, but in fact it does not need this - it only forms a chain of I2P addresses entering the tunnel and sends a message to the first of them through some of its outgoing tunnels.
In reality, you only need to know IP addresses to forward messages between tunnels.
On the contrary, to build an outgoing tunnel you need an incoming one, because otherwise it will be impossible to know the result of creating the tunnel.
In fact, requests to create tunnels can be sent directly, but in order not to bother, they introduced the concept of “zero tunnels”, which begin and end on their router and are created automatically upon startup. It is through them that the first tunnels are created.

When you enter an address, the router first finds the 32-byte hash of the site's I2P address that corresponds to the address you entered. With this number in hand, you need to find one of the site's incoming tunnels by requesting its LeaseSet. To do this, you select what you think is a good router and send a request there with this address. There are 3 possible outcomes:
1. The router returned LeaseSet to you
2. The router told you that it doesn't know this and returns a list of other routers that it thinks know
3. Doesn't answer at all

In option 2, we send the request to the routers it named. In option 3, we try again using other tunnels, and possibly a different router.
Once you have obtained the tunnel you need, you encrypt the message with the key specified in the LeaseSet and send it to that tunnel's router, addressed to the I2P destination of the site you need.
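
The three outcomes can be sketched as a simple iterative search. The query callable, its return convention and the limit on the number of queries are hypothetical; in a real client each query would be a DatabaseLookup sent through a tunnel.

```python
from collections import deque

def find_lease_set(key: bytes, start_routers, query, max_queries: int = 10):
    """Ask routers for a LeaseSet, following referrals.

    query(router, key) is assumed to return one of:
      ("lease_set", data)      -> option 1, the router knows the LeaseSet
      ("routers", [r1, r2...]) -> option 2, it suggests other routers
      None                     -> option 3, no answer
    """
    pending = deque(start_routers)
    asked = set()
    while pending and len(asked) < max_queries:
        router = pending.popleft()
        if router in asked:
            continue
        asked.add(router)
        reply = query(router, key)
        if reply is None:
            continue  # no answer: move on to the next candidate
        kind, value = reply
        if kind == "lease_set":
            return value
        # Referral: queue the suggested routers we have not asked yet.
        pending.extend(r for r in value if r not in asked)
    return None
```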

LeaseSet is always the same. The router simply holds the files and synchronizes them.
All routes are combined into one LeaseSet
forum.i2p forms the LeaseSet itself, choosing from its list of incoming tunnels the ones it considers necessary. Then it sends the LeaseSet to several routers, and they pass it on to each other.
What happens if you try to impersonate someone else, I honestly don’t know. I'm curious myself. Maybe that is a hole in their design.
A LeaseSet is a set of tunnels leading to a given destination. For one address, the LeaseSet is always the same; if another LeaseSet with the same address appears on the network, then the later one will be considered correct.

You request a LeaseSet through a tunnel, and if the router you are requesting turns out to belong to the intelligence services, then the only thing they can know is that someone is interested in this address, and they don’t know who exactly, because they don’t know where your outgoing tunnel comes from.
To do this, you select what you think is a good router and send a request there with this address.
Directly or through a chain?
Request LeaseSet only through a tunnel, otherwise they will quickly detect where you are going.
You can also request another router directly.
“Garlic”. Passing messages or sequences between two finished random route nodes and tunnels


After this phrase, reading further became much more fun!
Thanks, fixed it.
There's a lot more fun stuff to come, e.g. “the original unencrypted message must be sequentially decrypted before it is sent”.
It sounds funny, but that's exactly how it is. The unencrypted message has to be decrypted.
The encryption algorithms are as follows: There are two functions, Encrypt and Decrypt. They have the property: Encrypt(Decrypt(A)) = A and Decrypt(Encrypt(B)) = B. From this property it follows that in general it does not matter which of them actually “encrypts” and the second one “decrypts”, it is important that they work in pairs: if they encrypted the first, decrypt the second, and vice versa, if they encrypted the second, decrypt first. But for definiteness, one of them is still called Encrypt, the other Decrypt.

This is used not only in I2P. For example, the well-known 3DES (Triple DES) algorithm is actually DES applied three times according to the EDE scheme (each stage has its own subkey). That is, we encrypt with the DES Encrypt function using the first key, then process the result with the DES Decrypt function using the second key, and finally encrypt that with the DES Encrypt function using the third key; the output is the result of 3DES encryption. Decryption, correspondingly, is DED with the same keys. Since all the keys are different and independent, the key length of the overall scheme is three times that of DES: 168 bits versus 56.
When I first started my project, the first thing I looked at was them. There was still a lot that had not been done there: at that time there were not even tunnels. Now they have added some, but so far only transit.
In general, my vision of how the project should develop is somewhat different from theirs.
And you have everything posted on Github?

There, Log.h is included in some files, but I don’t see it in the repository.

And also, where is the entry point a la main.cpp?
Thank you for bringing this to my attention.
I’ll post Log.h and Queue.h now.
The correct main and Makefile are not yet made.
Added all missing files and updated outdated ones.
Put a temporary Makefile.
Everything should come together now.
install boost and crypto++ yourself from your Linux repository
For FreeBSD these are the devel/boost-all and security/cryptopp ports
I read the article and couldn’t help drawing analogies with the corresponding part of the Tor network. Tor introduces two new roles for serving websites: introduction points and rendezvous points. I just can’t figure out which node in I2P plays the role of an introduction point and which that of a rendezvous point.
In I2P there is the concept of floodfills, these are nodes that have sufficiently complete information about the network, and, in addition, undertake the obligation to report all new information to other floodfills. In fact, these are some kind of “directories” of the network, telling anyone how to access this or that node. Conventional routers only deal with building tunnels and sending data through them.